Understanding Experimental Design 


© 2022, SimBio. All Rights Reserved. 


SECTION 1: WHAT MAKES A GOOD EXPERIMENT? 
Why Do Experiments? 


Scientists spend a lot of time conducting experiments. 
Designing and running a good experiment is 
challenging, time-consuming, and expensive. Is it 
worth the trouble? 


It is. What makes experiments worthwhile is that they 
are the most powerful tool we have for establishing 
cause-and-effect relationships. 


This tutorial will introduce you to the principles of 
good experimental design. Because these principles are 
best learned by carrying out an actual experiment, your 
challenge will be to design, conduct, and interpret your 
own experiment. 


Section 1, p. 1


Plight of the Simploids 


Experiments are not just about testing abstract theories. They 
are also used to address practical issues, such as the challenge 
facing the fictional town of Idyllic. Residents of Idyllic need 
help solving a problem that has recently been causing distress 
among its citizens. The crisis involves a sudden disease that is 
threatening the town's population of Simploids, funny little 
creatures that are beloved by the people of Idyllic. 


Play the video to the right to learn more about the plight of 
Simploids in Idyllic. 






How Science Really Works 


To investigate what's happening with the Simploids, you'll 
need to employ "Science". The investigation prompted by the 
plight of the Simploids is typical of many scientific inquiries: a 
problem is discovered, questions are asked, investigation 
ensues. The testing of ideas is an important part, but just one 
part, of the scientific process. 


Undoubtedly you've already covered "the scientific method" at 
some time in your past coursework. Often the process of 
science is presented as a rigid, linear progression from 
observation to hypothesis to experiment to knowledge, but 
sometimes science unfolds very differently. In real-world 
investigations, science is much more dynamic, often 
generating more questions than it answers as scientists and 
communities share results, ideas, and feedback about systems 
that may be undergoing change all the while. 


Figure: How Science Works, a diagram linking Exploration, Testing Ideas, Benefits & Outcomes, and Community Analysis & Feedback. Adapted from University of California, 2015, with permission.


Click Next on the right to view the slideshow depicting how science works in practice, emphasizing the aspects covered in this tutorial: exploration and testing ideas.


Well-Designed Experiments 


The process of science includes both observational
studies and experiments. Observational
studies can suggest associations between two 
variables (such as eating a Mediterranean diet and 
longevity), but they cannot establish that change in 
one variable causes the change in the other—the 
association may exist because both variables are 
affected by a third factor that was not observed. 
Only with well-designed experiments, which 
systematically vary the hypothesized causal 
variable (diet) and measure the response of interest 
(longevity), can researchers infer causality. 


To understand how well-designed experiments can
help establish cause-and-effect relationships,
consider an example.




Does listening to classical music improve test performance? 


Imagine that two of your friends are in a class together, and they just got their exams back. One of them
listened to classical music while studying for the test and earned a high grade, while the other listened to her
standard nonclassical playlist and scored poorly on the test. The classical-music-listening friend claims that 
this is clear evidence that listening to Bach, Mozart and Beethoven improves test performance. Consider: 
What's wrong with this claim? 


Q1.1. Why should your friend NOT attribute the difference in test scores to study music? 


(Question contents not available to print.) 


The question above points to two types of logical error in your friend's claim: 


1. A lack of systematic variation. Many conditions other than the music your friends listened to may
have impacted their test performance. 

2. Limits to the scope of inference. Individuals are inherently variable. Therefore, a comparison 
between two students does not tell you whether the pattern would be true for other students. To make a 
general claim about the impact of study music on test performance, you would compare multiple 
students’ performances on multiple tests. 




Systematic Variation: Mind Your Variables! 


You probably already have a sense that experiments involve varying the potentially causal factor you're 
interested in testing (such as study music), and then making comparisons (such as comparing test 
performance). To make an inference about one factor influencing the other, you want to change only the 
hypothesized causal factor between the groups you are comparing, and hold all other factors constant (where 
possible). 


A well-designed experiment includes: 


• Independent, or treatment, variables: one or more likely causal variables that you manipulate. For each
  independent variable, there are two basic categories of treatment:
    1. Experimental group: a group that has been subjected to a hypothesized causal factor.
    2. Control group: a group that has the "baseline" condition, providing a comparison to
       experimental groups, to assess the effect of treatment.
• Potentially confounding variables: other variables that are held constant between groups.
• Dependent, or outcome, variables: appropriate measures of the treatment's effect.

You may have heard the word "control" being used in two different ways by scientists discussing experiments.
There is the control group that allows the causal effect to be measured. But scientists also refer to variables 
that they control—by holding constant—between groups. These are potentially confounding variables. This 
tutorial uses the word control only in the context of the control group or treatment. 


Q1.2. Suppose you decide to design an experiment to test your friend's hypothesis about 
classical music and studying. How will you define your variables? 


(Question contents not available to print.) 


Q1.3. In the experiment you're designing, what would you call the group of people that listen to 
classical music while studying? 


(Question contents not available to print.)

In some experiments, the same subjects are used for both the experimental group and the control group. This
might be done, for example, to reduce or eliminate the effect of potentially confounding variables that are
likely to differ greatly among subjects. In this case, the experimental design specifies the experimental
treatment and the control treatment, rather than groups.




Scope of Inference: What Will Your Conclusion Mean in the Real World? 


Experiments help us uncover potential cause-and-effect relationships underlying a general pattern, so you 
want your conclusions to be as generalizable as possible. This is what we mean by scope of inference—can we 
say something about the world in general or just a specific circumstance? 


Even with systematic variation, the scope of inference can be limited. Scientists have devised several 
strategies to address this challenge. First, remember your Mozart-loving friend making a general claim from a 
comparison between just two students: Her claim is clearly flawed. To demonstrate a general pattern, you 
must include multiple individuals in both the classical music treatment (the experimental group) and the 
control group. That is, you replicate the control or experimental treatment on many students. These are called 
replicates. 


Replication in an experiment is achieved by applying a treatment to multiple experimental units. In our 
example of study music and test performance, the experimental unit is an individual student. However, an 
experimental unit does not have to be an individual organism. It could be part of an individual, such as a cell, 
or it could be more broadly encompassing, such as a Petri dish containing colonies of bacteria, an aquarium 
with many fish, a field of corn, a forest, or a whole island. The appropriate experimental unit depends on your
question and what inference you want to draw: is it about an individual, a population, or a community?


An experiment to test the effect of study music on test performance could use experimental units that are narrowly 
defined, such as an individual student (left); more broadly scaled, such as a population of students in a class (middle); 
or even more broadly encompassing, such as a school (right). 




Whatever the experimental unit, there is always natural variation among units. To increase confidence in
your results, the best strategy is to assign experimental units randomly to groups when possible, so that
groups are less likely to differ by something that could confound results. Think of the students studying for a
test. You would not want to put all the 'A' students into the classical music group!
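
Random assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster of ten students and the helper name `assign_groups` are hypothetical and are not part of the tutorial.

```python
import random

def assign_groups(units, seed=None):
    """Randomly split experimental units into two groups of (nearly) equal size."""
    rng = random.Random(seed)   # local generator, so a given seed reproduces the split
    shuffled = list(units)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical roster of 10 students (placeholder names).
students = [f"student_{i}" for i in range(1, 11)]
groups = assign_groups(students, seed=42)

print("classical music:", groups["experimental"])
print("control:        ", groups["control"])
```

Because the shuffle ignores everything about the students, strong and weak test-takers are equally likely to land in either group.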


Replication is essential. Experiments are subject not only to inherent variation among experimental units but
also to chance factors such as weather, environmental conditions, and measurement error. Using replicates
increases your confidence that the differences you observe among groups or treatments are due to the
experimental manipulation and not to some other confounding factor.


Another strategy that can help increase scope of inference is repeating the experiment. This involves
reproducing the entire experiment, including all the treatment groups and all the replicates within
treatments.

Test your understanding of replication and scope of inference with the following example.


Some college campuses have started using "puppy therapy" to help reduce stress among students, according 
to a Psychology Today article. Suppose you design an experiment to test the hypothesis that puppy therapy is 
an effective stress reducer for students, within the context of a 6-section introductory biology course at your 
school. Each section has 12 students. Students in three of the sections get to play together with a pack of
therapy dogs for 15 minutes at the beginning of the
section recitation (or "lab") meeting. The other three 
sections simply observe 15 minutes of quiet free time 
instead. Students rate their stress levels immediately 
after the 15-minute period. Then you measure the 
percentage of students that report being "very stressed 
out" for each section, and you compare the average 
percentage between sections that played with puppies 
and those that didn't. 
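
The comparison described above can be sketched in Python as follows. The counts are invented for illustration only (they are not results from any study), and the names `very_stressed` and `percent_per_section` are hypothetical.

```python
from statistics import mean

SECTION_SIZE = 12  # students per section, as stated in the example

# Hypothetical counts of students per section who reported being "very stressed out".
very_stressed = {
    "puppy therapy": [3, 2, 4],     # three sections that played with therapy dogs
    "quiet free time": [5, 6, 4],   # three sections that had quiet free time
}

def percent_per_section(counts, size=SECTION_SIZE):
    """Convert per-section counts into per-section percentages."""
    return [100 * count / size for count in counts]

# One percentage per section, then one average per treatment.
group_means = {
    treatment: mean(percent_per_section(counts))
    for treatment, counts in very_stressed.items()
}

for treatment, avg in group_means.items():
    print(f"{treatment}: {avg:.1f}% very stressed (average across sections)")
```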


Q1.4. In your puppy therapy experiment, what is the experimental unit?


(Question contents not available to print.) 


Q1.5. In your puppy therapy experiment, how many replicates are in the experimental group? 


(Question contents not available to print.) 


Q1.6. Which of the following could help improve the scope of inference for your results? 


(Question contents not available to print.) 




A red crossbill perched on a pine branch.
Photo: iStock.com / Frank Leung 


Crossbills and Pine Cone Spines 


Designing experiments can be especially daunting for student scientists new to the task. Studying good 
examples helps. Before you are asked to devise your own experiment on Simploids, spend the next few pages 
exploring a well-designed experiment by New Mexico State University graduate student Kimberly Coffey, her 
advisor, Craig Benkman, and another NMSU faculty member, Brook Milligan. This experiment was motivated 
by observations on crossbills and the pine trees whose seeds they eat. 


Crossbills (genus Loxia) are finches named for their unusual crossed bills. The birds use their crossed bills to 
break into pine cones (genus Pinus) and liberate the developing seeds within. To witness some of these 
burglars in action, watch this video from the BBC: 


BBC video from "Life of Birds" showing crossbills eating ponderosa pine seeds. 


An intensely spined cone of the Table Mountain pine, whose seeds are eaten by
crossbills. Note this cone is closed, so it looks different than open cones such as
those on the ground under a tree.
Photo: Elektryczne jabłko. Licensed under CC BY-SA 3.0.


Many pines have sharp spines on the woody scales of their cones. What are these spines for? 


Coffey and colleagues hypothesized, as have other biologists, that the spines are adaptive because they make 
it harder for crossbills and other seed predators to steal pine nuts. The researchers reasoned that if this 
predator-deterrence hypothesis were correct, then cones with spines would be harder for crossbills to raid than 
cones without spines. 


To check this prediction, the biologists clipped the spines off all the scales on some pine cones, and left other
pine cones intact. They then measured how fast crossbills can extract seeds from the spineless versus spiny
cones.¹


On the following page, click Next to start watching the animated summary of the researchers’ clever 
experiment. Explore all the steps before continuing. 


1 Coffey, K., C.W. Benkman, and B.G. Milligan. 1999. The adaptive significance of spines on pine cones. Ecology 80(4):1221-1229. 




Q1.7. The pale yellow boxes below describe four elements necessary for systematic variation in
an experiment. How were these elements incorporated into the crossbill and pine cone
experiment?

• Independent Variable: What likely causal factor did the scientists manipulate?
• Control Treatment: For the independent variable, what was the baseline condition?
• Experimental Treatment: For the independent variable, what was the manipulated condition?
• Dependent Variable: What did the scientists measure?

(Question contents not available to print.)




Replication in the Crossbill Experiment 


One way Coffey and her colleagues included replication in their 
experiment was to measure the effect of the cone spine removal 
treatment using seven different birds. They did not expect the 
outcome to be identical for each bird. Individuals vary, and if 
they saw (or did not see) an effect with a single bird, they 
wouldn't know if the outcome was real or a fluke. By using 
replication, they reduced uncertainty and improved the 
generality of their results. 



The birds in their experiment provide a compelling example of 
why replication is essential. 


Click Bird A on the right to see its data.

Figure: foraging time for each bird, Spines Present versus Spines Removed.

Observe the Average Difference displayed to the right of the graph. This summary measure, derived from the
dependent variable (average time for a bird to eat a seed, or foraging time), is calculated by subtracting
each bird's foraging time for spine-free cones from its foraging time for spiny cones.
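
As a rough sketch of this calculation, the Python below subtracts each bird's spine-free time from its spiny time and then averages the differences. The foraging times are made up for illustration and are NOT the data from Coffey et al.; the names `foraging_time` and `paired_differences` are likewise hypothetical.

```python
from statistics import mean

# Hypothetical foraging times in seconds per seed (not the published data).
foraging_time = {
    "A": {"spines_present": 10.0, "spines_removed": 11.0},
    "B": {"spines_present": 14.0, "spines_removed": 11.5},
    "C": {"spines_present": 12.5, "spines_removed": 10.0},
}

def paired_differences(times):
    """Per-bird difference: time on spiny cones minus time on spine-free cones."""
    return {
        bird: t["spines_present"] - t["spines_removed"]
        for bird, t in times.items()
    }

diffs = paired_differences(foraging_time)
average_difference = mean(diffs.values())

for bird, d in diffs.items():
    print(f"Bird {bird}: {d:+.2f} s")
print(f"Average difference: {average_difference:+.2f} s")
```

In this invented data set one bird's difference has the opposite sign from the average. That is the kind of individual variation that makes a single replicate unreliable.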


Q1.8. Based only on Bird A's results, what is the average foraging time difference between 
cones with spines present and cones with no spines? 


(Question contents not available to print.) 


Note: You can click a data point on any graph to see its precise value. 




Now look at data from all seven birds to see if the overall pattern is similar to the conclusion based only on 
Bird A. 


Click each of the other birds (B-G) to see each individual's result in the graph. As you add results to the 
graph, the average time difference, in seconds, between the two cone types (spines present versus spines 
removed) is updated. 


Q1.9. Based on results from all seven birds, what is the average foraging time difference 
between cones with spines present and cones with no spines? 


(Question contents not available to print.) 


Q1.10. How would the researchers' experiment have been different if they had used only data 
from Bird A (and not all 7 birds)? 


(Question contents not available to print.) 


You may have deduced that Coffey and colleagues’ experiment 
included more than one level of replication. Click the link to 
discover more. 


Figure: Mean Foraging Time (s), Spines Present versus Spines Removed.




Summary: Why Do Pine Cones Have Spines? 


Because their experiment¹ on pine cone spines and crossbill seed predation was so carefully designed, Coffey and
colleagues' results were clear. They found that, despite variation in feeding rates among birds, crossbills on
average required more time to steal seeds from cones with spines than from cones lacking spines. For Table
Mountain pine, the difference was about 2 seconds per seed.

The graph on the right summarizes the researchers' data.
It shows the average foraging time (that is, the number of seconds it takes to steal a seed) for crossbills
foraging on open Table Mountain pine cones with spines present (left) and absent (right).

Figure: Average Foraging Time (s), Spines Present versus Spines Absent. Average number of seconds required for a
crossbill to extract and eat a seed from open Table Mountain pine cones with (left) and without (right) spines.
Error bars are +1 standard error. Adapted from Coffey et al. 1999.
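
For readers unfamiliar with the standard error shown by the error bars, here is a minimal Python sketch of how a mean and its standard error can be computed from replicate measurements. The seven values are hypothetical placeholders, not the published data, and `mean_and_standard_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_standard_error(values):
    """Return the sample mean and its standard error (sample SD / sqrt(n))."""
    n = len(values)
    return mean(values), stdev(values) / sqrt(n)

# Hypothetical per-bird foraging times (seconds per seed) for one cone type.
spines_present = [12.0, 14.0, 13.0, 15.0, 11.0, 13.5, 12.5]

avg, se = mean_and_standard_error(spines_present)
print(f"mean = {avg:.2f} s, standard error = {se:.2f} s")
```

Because the standard error divides by the square root of the number of replicates, adding replicates shrinks the error bar even when the birds themselves are just as variable.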


Q1.11. Based on the results shown, what could Coffey and her colleagues conclude about their 
hypothesis that cone spines benefit pine trees because they make it harder for crossbills to 
steal pine seeds? 


(Question contents not available to print.) 


1 Coffey, K., C.W. Benkman, and B.G. Milligan. 1999. The adaptive significance of spines on pine cones. Ecology 80(4):1221-1229. 




Check Your Understanding 


Any good experiment should relate as directly as possible to the hypothesis you set out to test. The 
independent variable, the dependent variable, the control group, and even the time course of the experiment 
(whether seconds or decades) must be chosen so that your experimental results can support, or refute, your 
hypothesis. 


Thinking through how to best design an experiment to test the hypothesis you're interested in can be tricky. 
For more practice in this kind of thinking, answer the questions below and on the next page. 


Q1.12. Consider this alternative experiment intended to test the hypothesis tested by Coffey 
and colleagues, that spines on pine cones impede seed predation by crossbills: Alter the bills of 
several crossbills to remove their cone-opening advantage, and then measure the time 
required for these birds to eat a certain number of seeds from cones with spines. Then 
compare the average time to that from birds with unaltered bills. 


Is this experiment an appropriate test of the hypothesis, and why (or why not)? 


(Question contents not available to print.) 


The drastic decrease in populations of honeybees, and
specifically the sudden disappearance of most bees from entire
hives (called Colony Collapse Disorder, or CCD), has caused great
concern among many farmers because bees are important crop
pollinators. Many causes have been proposed for CCD, including
pathogens (disease-causing microbes) and pesticides, but as yet
biologists haven't agreed on a single cause. One hypothesis is
that the electromagnetic fields (EMF) produced by man-made
structures like power lines and cellphone towers disrupt bee behavior and cause bees to abandon their hives.


Q1.13. Suppose you want to set up an experiment to test the hypothesis that electromagnetic 
fields cause CCD, and you have 6 hives to work with. Which of these is the best experimental 
setup? 


(Question contents not available to print.) 


Q1.14. Suppose you set up the experiment suggested in the previous question and find that the 
colony collapse rate is slightly higher among hives exposed to EMF than among hives with no 
exposure. What can you conclude about the hypothesis? 


(Question contents not available to print.) 


Note: While the cause of colony collapse disorder is still under active investigation, in the real world there is
more experimental support for the hypothesis that certain insecticides cause CCD than there is for
electromagnetic fields as the culprit. For a well-designed study linking insecticide exposure to CCD, see
Lu et al. (2014), cited below.


1 Lu, C., K.M. Warchol, and R.A. Callahan. 2014. Sub-lethal exposure to neonicotinoids impaired honey bees winterization before 
proceeding to colony collapse disorder. Bulletin of Insectology 67(1): 125-130. 




Check Your Understanding (continued) 


Q1.15. A good experiment must include: 


(Question contents not available to print.) 


Q1.16. Some researchers have hypothesized that reading fiction can increase empathy in 
children. What prediction follows from this hypothesis? 


(Question contents not available to print.) 


Q1.17. Suppose your friends decide to test the hypothesis that reading fiction can increase 
empathy in children. They randomly choose two children from a local school. They give one 
child a nonfiction book to read, and the other child a fiction book to read in the same amount of 
time. Before and after the subjects have read their books, your friends give each child a 
standard psychological test for empathy. 


Which of the following changes would make your friends' experiment better, and why? 


(Question contents not available to print.) 




Check Your Understanding (continued) 


Your college has a small farm where they raise vegetables for sale and for 
use in the college cafeterias. Students have requested that the farm 
switch from using commercial fertilizer to using organic compost. The 
farm managers have said they are willing to make the switch if you can 
show that the compost is just as effective as the fertilizer in producing the 
same yield of vegetables. They have provided several plots of tomato 
plants for you to use to conduct an experiment to test the hypothesis that
compost is as effective as fertilizer for tomato production.

Q1.18. Which experimental setup below is the most appropriate for testing the hypothesis?


(Question contents not available to print.) 


Q1.19. Which of the following measurements (dependent variables) would most appropriately 
test the hypothesis? 


(Question contents not available to print.) 


In the next section, you'll use what you've learned from these examples of experimental design to try your 
hand at designing and conducting an experiment to find the source of the Simploids' plight in Idyllic. 




